{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Chris Holden (ceholden@gmail.com) - https://github.com/ceholden\n",
    "\n",
    "\n",
    "Chapter 1: Exploring the GDALDataset class\n",
    "=========================================="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "One of the most fundamental components of the GDAL raster library is a \"[class](https://en.wikipedia.org/wiki/Class_(computer_programming))\", or an object, that stores all of the information one might want about a raster image. This class, the `GDALDataset`, combines the information about a raster file with a few actions one might want to perform with a raster image, like reading from an image. The information stored by a class is generally referred to as \"[properties](https://en.wikipedia.org/wiki/Property_(programming))\" or attributes and the actions a class can perform are called \"[methods](https://en.wikipedia.org/wiki/Method_(computer_programming))\".\n",
    "\n",
    "If you're coming from another language and want an overview of object oriented programming in Python, see the [Python tutorial on classes](https://docs.python.org/3/tutorial/classes.html) or the [LearnPython class tutorial](http://www.learnpython.org/en/Classes_and_Objects).\n",
    "\n",
    "For those wishing for a verbose, full reference to the the GDAL \"Application Programming Interface\" ([API](http://en.wikipedia.org/wiki/Application_programming_interface)), you can find the C and Python API here:\n",
    "\n",
    "- [C API](http://gdal.org/python/osgeo.gdal.Dataset-class.html)\n",
    "- [Python API](http://gdal.org/python/osgeo.gdal.Dataset-class.html)\n",
    "\n",
    "\n",
    "Some class methods include:\n",
    "\n",
    "- `GetDriver`\n",
    "- `GetRasterBand`\n",
    "- `GetGeoTransform`\n",
    "- `GetProjection`\n",
    "- `GetSubDatasets`\n",
    "\n",
    "These class methods are so called \"getter\" methods that allow you to access class attributes (remember: class attributes are just variables that belong to the class). When you call the class method, `GetDriver`, the GDAL dataset will return the image format driver (e.g., ENVI driver, GeoTIFF driver, HDF driver) responsibile for handling the input and output operations for this raster file format. Similarly, the `GetGeoTransform` method will the transformation that can be used to translate between pixel coordinates and projection coordinates.\n",
    "\n",
    "> Note: the \"getter\" and \"setter\" class methods for accessing and settting class properties is not \"Pythonic\" - these methods exist because the API was originally written for C++ where such methods are normal.\n",
    "    \n",
    "Another suite of class methods allow you to set class attributes. These include:\n",
    "\n",
    "- `SetGeoTransform`\n",
    "- `SetProjection`\n",
    "\n",
    "which allow you to modify the geographic projection and location of the image."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Module import in Python\n",
    "\n",
    "Now that we've seen some of how the GDALDataset object encapsulates many of the ideas relevant to the concept of a raster image, let's see how we can implement these ideas in Python.\n",
    "\n",
    "Before we can get started, we need to tell Python that we will be using functions, classes, and variables from the GDAL Python package. The technical wording for this is that we need to import the GDAL module into our [namespace](http://en.wikipedia.org/wiki/Namespace) (see Python's documentation on the `module` system [here](https://docs.python.org/2/tutorial/modules.html)).\n",
    "\n",
    "We will do this using some `import` statements:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GDAL's version is: 2.0.0\n",
      "<module 'osgeo.gdal' from '/home/ceholden/conda/envs/open-geo-tutorial/lib/python3.4/site-packages/osgeo/gdal.py'>\n"
     ]
    }
   ],
   "source": [
    "# Import the Python 3 print function\n",
    "from __future__ import print_function\n",
    "\n",
    "# Import the \"gdal\" submodule from within the \"osgeo\" module\n",
    "from osgeo import gdal\n",
    "\n",
    "# We can check which version we're running by printing the \"__version__\" variable\n",
    "print(\"GDAL's version is: \" + gdal.__version__)\n",
    "print(gdal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we import the `gdal` submodule, Python will know where to look on our system for the code that implements the GDAL API. When we want to access classes, variables, or functions within the `gdal` submodule, we will need to reference the full path, which includes the `gdal` reference:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    },
    {
     "ename": "NameError",
     "evalue": "name 'GDT_Byte' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-6-cb75e61c602f>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      6\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgdal\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mGDT_Byte\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      7\u001b[0m \u001b[1;31m# Doesn't work\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mGDT_Byte\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mNameError\u001b[0m: name 'GDT_Byte' is not defined"
     ]
    }
   ],
   "source": [
    "# Let's print out the value of the GDAL Byte data type (GDT_Byte)\n",
    "#     the number printed out is how GDAL keeps track of the various data types\n",
    "#     this variable, which has a fixed numeric value, is what's called an enumerated type, or enum\n",
    "\n",
    "# Works\n",
    "print(gdal.GDT_Byte)\n",
    "# Doesn't work\n",
    "print(GDT_Byte)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The datatype `GDT_Byte` doesn't exist in the *global* namespace. We need to tell Python where to look for it to find it.\n",
    "\n",
    "With some basic explanation of Python's namespace setup and how it applies to GDAL out of the way, let's get into some examples:\n",
    "\n",
    "### Examples\n",
    "#### Open an image\n",
    "When we open an image in GDAL we create a GDALDataset object. As the name would suggest, we can open an image with the \"Open\" function within `gdal`.\n",
    "\n",
    "We will use an example image provided in this repository for this chapter. This image is a subset of a Landsat 7 image containing the 7 bands on this sensor rearranged in order of wavelength (e.g., Landsat 7's second SWIR channel comes before thermal channel in our stack). The last band in this image is a cloud and cloud shadow mask from Fmask."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Open a GDAL dataset\n",
    "dataset = gdal.Open('../../example/LE70220491999322EDC01_stack.gtif', gdal.GA_ReadOnly)\n",
    "\n",
    "print(dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have this dataset open, let's explore some of its capabilities.\n",
    "\n",
    "#### Image attributes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# How many bands does this image have?\n",
    "num_bands = dataset.RasterCount\n",
    "print('Number of bands in image: {n}\\n'.format(n=num_bands))\n",
    "\n",
    "# How many rows and columns?\n",
    "rows = dataset.RasterYSize\n",
    "cols = dataset.RasterXSize\n",
    "print('Image size is: {r} rows x {c} columns\\n'.format(r=rows, c=cols))\n",
    "\n",
    "# Does the raster have a description or metadata?\n",
    "desc = dataset.GetDescription()\n",
    "metadata = dataset.GetMetadata()\n",
    "\n",
    "print('Raster description: {desc}'.format(desc=desc))\n",
    "print('Raster metadata:')\n",
    "print(metadata)\n",
    "print('\\n')\n",
    "\n",
    "# What driver was used to open the raster?\n",
    "driver = dataset.GetDriver()\n",
    "print('Raster driver: {d}\\n'.format(d=driver.ShortName))\n",
    "\n",
    "# What is the raster's projection?\n",
    "proj = dataset.GetProjection()\n",
    "print('Image projection:')\n",
    "print(proj + '\\n')\n",
    "\n",
    "# What is the raster's \"geo-transform\"\n",
    "gt = dataset.GetGeoTransform()\n",
    "print('Image geo-transform: {gt}\\n'.format(gt=gt))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first few pieces of information we obtained are fairly straightforward - the raster size, the number of bands, a description, some metadata, and the raster's file format.\n",
    "\n",
    "The image's projection is formatted in what's known as \"Well Known Text\". For more information on specific projections and for format conversions among projection description formats (e.g., proj4 string, WKT, ESRI WKT, JSON, etc.) see [Spatial Reference](http://www.spatialreference.org).\n",
    "\n",
    "The last piece of information we accessed is something called a \"geotransform\". This set of 6 numbers provides all the information required to and from transform pixel and projected coordinates. In this example, the first number (462405) and the fourth number (1741815) are the top left of the upper left pixel of the raster. The pixel size in x and y dimensions of the raster is listed as the second (30) and the sixth (-30) numbers. Since our raster is north up oriented, the third and fifth numbers are 0. For more information on the GDAL data model, [visit this web page](http://www.gdal.org/gdal_datamodel.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Image raster bands\n",
    "\n",
    "The GDALDataset object we created contains a lot of useful information but it is not directly used to read in the raster image. Instead we will need to access each of the raster's bands individually using the method `GetRasterBand`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Open the blue band in our image\n",
    "blue = dataset.GetRasterBand(1)\n",
    "\n",
    "print(blue)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Following our guide of the GDALDataset, let's explore some of the attributes and methods of the GDALRasterBand:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# What is the band's datatype?\n",
    "datatype = blue.DataType\n",
    "print('Band datatype: {dt}'.format(dt=blue.DataType))\n",
    "\n",
    "# If you recall from our discussion of enumerated types, this \"3\" we printed has a more useful definition for us to use\n",
    "datatype_name = gdal.GetDataTypeName(blue.DataType)\n",
    "print('Band datatype: {dt}'.format(dt=datatype_name))\n",
    "\n",
    "# We can also ask how much space does this datatype take up\n",
    "bytes = gdal.GetDataTypeSize(blue.DataType)\n",
    "print('Band datatype size: {b} bytes\\n'.format(b=bytes))\n",
    "\n",
    "# How about some band statistics?\n",
    "band_max, band_min, band_mean, band_stddev = blue.GetStatistics(0, 1)\n",
    "print('Band range: {minimum} - {maximum}'.format(maximum=band_max,\n",
    "                                                 minimum=band_min))\n",
    "print('Band mean, stddev: {m}, {s}\\n'.format(m=band_mean, s=band_stddev))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we didn't need to read the image into Python's memory to calculate these statistics - GDAL did all of this for us.\n",
    "\n",
    "For most applications, however, we will need to use GDAL to read our raster bands into memory. When we load our raster band into memory we will read it into a [NumPy](http://www.numpy.org/) 2 dimensional array. NumPy is, [\"the fundamental package for scientific computing with Python\"](http://www.numpy.org/), because it allows us to represent our data in a very memory efficient way.\n",
    "\n",
    "NumPy arrays are the cornerstone or building block of the rest of the Scientific Python suite of software. Get familiar with them:\n",
    "\n",
    "+ [NumPy for MATLAB users](http://wiki.scipy.org/NumPy_for_Matlab_Users)\n",
    "+ [NumPy tutorial](http://wiki.scipy.org/Tentative_NumPy_Tutorial)\n",
    "+ [NumPy API reference manual](http://docs.scipy.org/doc/numpy/reference/)\n",
    "\n",
    "Just as we made the routines and data types from GDAL available to us using `import`, we'll load up NumPy. When we import NumPy, we'll also give it an alias so that we don't have to type `numpy` every time we want to use it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# No alias\n",
    "import numpy\n",
    "print(numpy.__version__)\n",
    "\n",
    "# Alias or rename to \"np\" -- a very common practice\n",
    "import numpy as np\n",
    "print(np.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to read our band into one of these wonderful `np.array` objects, we will use the `ReadAsArray` method from our `GDALRasterBand` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "help(blue.ReadAsArray)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The method `ReadAsArray` takes arguments that allow us to specify a subset of the raster band image using X and Y offsets and sizes. Remember this ability when you want to process large images or are working with a limited amount of memory. In these circumstances, you will run out of memory if you read the entire dataset in at once. Instead, read in a block of some number of columns and rows at one time, perform your computation and store your output, and then chunk through the rest of the image.\n",
    "\n",
    "For now, we'll just read in the entire image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "blue_data = blue.ReadAsArray()\n",
    "\n",
    "print(blue_data)\n",
    "print()\n",
    "print('Blue band mean is: {m}'.format(m=blue_data.mean()))\n",
    "print('Size is: {sz}'.format(sz=blue_data.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With our data read into a NumPy array, we can print it to console and even perform statistics on it. In addition to helping us store massive amounts of data efficiently, NumPy will help us with some basic linear algebra, numerical operations, and summary statistics.\n",
    "\n",
    "Let's say we want to read all of our bands into one 3 dimensional (nrow x ncol x nband) dataset. We will begin by initializing a 3 dimensional array. Next, we will loop over all bands in our raster image dataset and read them into our newly allocated 3 dimensional array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Initialize a 3d array -- use the size properties of our image for portability!\n",
    "image = np.zeros((dataset.RasterYSize, dataset.RasterXSize, dataset.RasterCount))\n",
    "\n",
    "# Loop over all bands in dataset\n",
    "for b in range(dataset.RasterCount):\n",
    "    # Remember, GDAL index is on 1, but Python is on 0 -- so we add 1 for our GDAL calls\n",
    "    band = dataset.GetRasterBand(b + 1)\n",
    "    \n",
    "    # Read in the band's data into the third dimension of our array\n",
    "    image[:, :, b] = band.ReadAsArray()\n",
    "\n",
    "print(image)\n",
    "print(image.dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One minor tweak we can make is to ensure that we read our GDAL image into a NumPy array of a matching datatype. GDAL has a function which can make this `GDAL` <-> `NumPy` translation for us:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dataset.GetRasterBand(1).DataType"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from osgeo import gdal_array\n",
    "\n",
    "# DataType is a property of the individual raster bands\n",
    "image_datatype = dataset.GetRasterBand(1).DataType\n",
    "\n",
    "# Allocate our array, but in a more efficient way\n",
    "image_correct = np.zeros((dataset.RasterYSize, dataset.RasterXSize, dataset.RasterCount),\n",
    "                 dtype=gdal_array.GDALTypeCodeToNumericTypeCode(image_datatype))\n",
    "\n",
    "# Loop over all bands in dataset\n",
    "for b in range(dataset.RasterCount):\n",
    "    # Remember, GDAL index is on 1, but Python is on 0 -- so we add 1 for our GDAL calls\n",
    "    band = dataset.GetRasterBand(b + 1)\n",
    "    \n",
    "    # Read in the band's data into the third dimension of our array\n",
    "    image_correct[:, :, b] = band.ReadAsArray()\n",
    "\n",
    "print(\"Compare datatypes: \")\n",
    "print(\"    when unspecified: {dt}\".format(dt=image.dtype))\n",
    "print(\"    when specified: {dt}\".format(dt=image_correct.dtype))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Much more efficiently done this way -- we're saving 4x the memory!\n",
    "\n",
    "As you proceed into the next chapter, the last key concept to understand is how to de-allocate memory within Python.\n",
    "\n",
    "In order to close out your GDAL datasets and to signal that your NumPy arrays can be de-allocated, you can set them to `None`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dataset = None\n",
    "\n",
    "image = None\n",
    "image_correct = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next chapter (link to [webpage](chapter_2_indices.html) or [Notebook](chapter_2_indices.ipynb)) puts these lessons to use in order to calculate the Normalized Difference Vegetation Index (NDVI)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
